Speaker normalization for audio-visual articulation training

Authors

  • Marcel Ogner
  • Zdravko Kacic
Abstract

The paper describes a formant-based speaker normalization method suitable for speech visualization and articulation training systems. The method estimates an error function from a speaker's formant characteristics for a given vowel. The estimated error function indicates how far to shift the critical-band filters on the mel-warped frequency scale. The paper also describes an accurate technique for formant tracking.
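
The abstract does not give the error function or the filter layout, so the following is only a minimal sketch of the general idea under stated assumptions: formants are compared on the mel scale using the common 2595·log10(1 + f/700) approximation, the error is taken as the mean mel difference between a speaker's formants and reference formants for one vowel, and the filter centres are shifted by that amount. The function names, the reference formants, and the averaging rule are all hypothetical and are not taken from the paper.

```python
import math


def hz_to_mel(f_hz):
    # Common mel approximation (an assumption; the paper's warping is not given).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def formant_error_mel(speaker_formants_hz, reference_formants_hz):
    # Hypothetical error: mean mel-scale offset of the speaker's formants
    # from the reference formants for the same vowel.
    diffs = [
        hz_to_mel(s) - hz_to_mel(r)
        for s, r in zip(speaker_formants_hz, reference_formants_hz)
    ]
    return sum(diffs) / len(diffs)


def shifted_centers_hz(n_filters, f_low_hz, f_high_hz, shift_mel):
    # Centres equally spaced on the mel scale, then moved by shift_mel.
    lo, hi = hz_to_mel(f_low_hz), hz_to_mel(f_high_hz)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1) + shift_mel) for i in range(n_filters)]


# Illustrative values only: F1/F2 of one vowel for a speaker and a reference.
shift = formant_error_mel([800.0, 1400.0], [730.0, 1090.0])
centers = shifted_centers_hz(20, 100.0, 4000.0, shift)
```

A speaker whose formants sit above the reference yields a positive shift, which moves every filter centre upward; identical formants yield a zero shift and leave the filterbank unchanged.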

Similar resources

HMM-based visual speech recognition using intensity and location normalization

This paper describes intensity and location normalization techniques for improving the performance of visual speech recognizers used in audio-visual speech recognition. For auditory speech recognition, there exist many methods for dealing with channel characteristics and speaker individualities, e.g., CMN (cepstral mean normalization), SAT (speaker adaptive training). We present two techniques ...

Speaker adaptation for audio-visual speech recognition

In this paper, speaker adaptation is investigated for audiovisual automatic speech recognition (ASR) using the multistream hidden Markov model (HMM). First, audio-only and visual-only HMM parameters are adapted by combining maximum a posteriori and maximum likelihood linear regression adaptation. Subsequently, the audio-visual HMM stream exponents are adapted to better capture the reliability o...

Multi-pose lipreading and audio-visual speech recognition

In this article, we study the adaptation of visual and audio-visual speech recognition systems to non-ideal visual conditions. We focus on overcoming the effects of a changing pose of the speaker, a problem encountered in natural situations where the speaker moves freely and does not keep a frontal pose with relation to the camera. To handle these situations, we introduce a pose normalization b...

Speaker adaptation of an acoustic-to-articulatory inversion model using cascaded Gaussian mixture regressions

The article presents a method for adapting a GMM-based acoustic-articulatory inversion model trained on a reference speaker to another speaker. The goal is to estimate the articulatory trajectories in the geometrical space of a reference speaker from the speech audio signal of another speaker. This method is developed in the context of a system of visual biofeedback, aimed at pronunciation trai...

Gradient and Visual Speaker Normalization in the Perception of Fricatives

The role of visual information in speaker normalization of fricatives is examined by comparing listeners' responses to prototypical and non-prototypical male and female speech with their responses to audio-visual integrated stimuli (speech signal and face of speaker) in the fricatives [s] ("sod") and [ʃ] ("shod") in Ohio English. Results from audio-only identification tasks suggest that th...

Publication date: 1999